ABSTRACT

With so much of our lives computerized, it is vitally
important that machines and humans can understand
one another and pass information back and forth.
Mostly, computers have things their way: we have to
talk to them through relatively crude devices such as
keyboards and mice so they can figure out what we
want them to do. However, when it comes to
processing more human kinds of information, like an
old-fashioned printed book or a letter scribbled with a
fountain pen, computers have to work much harder.
That is where optical character recognition (OCR)
comes in. Here we process the image by applying
various pre-processing techniques, such as deskewing
and binarization, and then use the Tesseract OCR engine
to recognize the characters and produce the final
document.

Keywords: OpenCV-Python; Image Processing; Text
Extraction; Image Thresholding; Virtual Image

I. INTRODUCTION 

Text data present in images contains useful
information for automatic annotation, indexing, and 
structuring of images. Extraction of this information 
involves detection, localization, tracking, extraction, 
enhancement, and recognition of the text from a given 
image. However, variations of text due to differences 
in size, style, orientation, and alignment, as well as 
low image contrast and complex background make 
the problem of automatic text extraction extremely
challenging. While comprehensive surveys of related
problems such as face detection, document analysis, 
and image indexing can be found, the problem of text 
information extraction is not well surveyed. A large 
number of techniques have been proposed to address 
this problem, and the purpose of this paper is to 
classify and review these algorithms, discuss 
benchmark data and performance evaluation, and to 
point out promising directions for future research. 

Content-based image indexing refers to the process of 
attaching labels to images based on their content. 
Image content can be divided into two main 
categories: perceptual content and semantic content. 
Perceptual content includes attributes such as color, 
intensity, shape, texture, and their temporal changes, 
whereas semantic content means objects, events, and 
their relations. A number of studies on the use of 
relatively low-level perceptual content for image and 
video indexing have already been reported. Studies on 
semantic image content in the form of text, face, 
vehicle, and human action have also attracted some 
recent interest. Among them, text within an image is
of particular interest because:

> It is very useful for describing the contents of 
an image; 

> It can be easily extracted compared to other 
semantic contents, and 

> It enables applications such as keyword-based 
image search, automatic video logging, and 
text-based image indexing.

II. TEXT IN IMAGES

A variety of approaches to text information extraction 
(TIE) from images have been proposed for specific 
applications including page segmentation, address 
block location, license plate location, and
content-based image indexing.



Fig. 1: Grayscale document images 



Fig. 2: Multi-color document images 



Fig. 3: Images with caption text 



Fig. 4: Scene text images 


Text in images can exhibit many variations with
respect to properties such as geometry, color, motion,
edge, and compression.


Table 1: Properties of text in images

Properties                          Variants or sub-classes
----------------------------------  ----------------------------------------------------
Geometry: Size                      Regularity in size of text
Geometry: Alignment                 Horizontal/vertical
                                    Straight line with skew (implies vertical direction)
                                    Curves
                                    3D perspective distortion
Geometry: Inter-character distance  Aggregation of characters with uniform distance
Colour                              Gray
                                    Colour (monochrome, polychrome)
Motion                              Static
                                    Linear movement
                                    2D rigid constrained movement
                                    3D rigid constrained movement
                                    Free movement
Edge                                Strong edges (contrast) at text boundaries
Compression                         Un-compressed image
                                    JPEG, MPEG-compressed image


A Text Information Extraction (TIE) system receives
its input in the form of a still image or a sequence of
images. The images can be in grayscale or color,
compressed or uncompressed, and the text in the images
may or may not move. The TIE problem can be divided
into the following sub-problems: (i) detection,
(ii) localization, (iii) tracking, (iv) extraction and
enhancement, and (v) optical character recognition (OCR).

III. IMAGE THRESHOLDING


A. Threshold Binary

This thresholding operation can be expressed as:

$$
\mathrm{dst}(x,y) =
\begin{cases}
\mathrm{maxVal} & \text{if } \mathrm{src}(x,y) > \mathrm{thresh} \\
0 & \text{otherwise}
\end{cases}
$$

So, if the intensity of the pixel src(x, y) is higher than
thresh, then the new pixel intensity is set to
maxVal. Otherwise, the pixel is set to 0.


B. Threshold Binary, Inverted

Fig. 5: Threshold Binary, Inverted

This thresholding operation can be expressed as:

$$
\mathrm{dst}(x,y) =
\begin{cases}
0 & \text{if } \mathrm{src}(x,y) > \mathrm{thresh} \\
\mathrm{maxVal} & \text{otherwise}
\end{cases}
$$

If the intensity of the pixel src(x, y) is higher than
thresh, then the new pixel intensity is set to 0.
Otherwise, it is set to maxVal.


C. Truncate

Fig. 6: Truncate

This thresholding operation can be expressed as:

$$
\mathrm{dst}(x,y) =
\begin{cases}
\mathrm{thresh} & \text{if } \mathrm{src}(x,y) > \mathrm{thresh} \\
\mathrm{src}(x,y) & \text{otherwise}
\end{cases}
$$

The maximum intensity value for the pixels is
thresh; if src(x, y) is greater, then its value is
truncated to thresh.


D. Threshold to Zero

This operation can be expressed as:

$$
\mathrm{dst}(x,y) =
\begin{cases}
\mathrm{src}(x,y) & \text{if } \mathrm{src}(x,y) > \mathrm{thresh} \\
0 & \text{otherwise}
\end{cases}
$$

If src(x, y) is not greater than thresh, the new pixel
value will be set to 0.


E. Threshold to Zero, Inverted

Fig. 8: Threshold to Zero, Inverted

This operation can be expressed as:

$$
\mathrm{dst}(x,y) =
\begin{cases}
0 & \text{if } \mathrm{src}(x,y) > \mathrm{thresh} \\
\mathrm{src}(x,y) & \text{otherwise}
\end{cases}
$$

If src(x, y) is greater than thresh, the new pixel value
will be set to 0.
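
The five thresholding rules in this section differ only in what they assign on each side of the threshold. As a minimal sketch, written in pure Python with function and mode names chosen here for illustration (they are not OpenCV identifiers), the rules can be expressed as follows:

```python
# Illustrative pure-Python versions of the five thresholding rules.
# The function names and mode strings are assumptions made for this sketch.

def threshold_pixel(value, thresh, max_val, mode):
    """Apply one thresholding rule to a single grayscale intensity."""
    above = value > thresh  # strict comparison, as in the formulas
    if mode == "binary":
        return max_val if above else 0
    if mode == "binary_inv":
        return 0 if above else max_val
    if mode == "trunc":
        return thresh if above else value
    if mode == "tozero":
        return value if above else 0
    if mode == "tozero_inv":
        return 0 if above else value
    raise ValueError(f"unknown mode: {mode}")


def threshold_image(image, thresh, max_val, mode):
    """Apply a rule to every pixel of an image stored as a list of rows."""
    return [[threshold_pixel(p, thresh, max_val, mode) for p in row]
            for row in image]


# A tiny 2x3 grayscale "image" used only for demonstration.
sample = [[0, 100, 127], [128, 200, 255]]
for mode in ("binary", "binary_inv", "trunc", "tozero", "tozero_inv"):
    print(mode, threshold_image(sample, 127, 255, mode))
```

Because the comparison is strict, a pixel whose intensity equals thresh falls into the "otherwise" branch of every rule.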


F. Simple Thresholding

If a pixel value is greater than a threshold value, it is
assigned one value (which may be white); otherwise it is
assigned another value (which may be black). The function
used is cv2.threshold. The first argument is the source
image, which should be a grayscale image. The second
argument is the threshold value, which is used to classify
the pixel values. The third argument is maxVal, which
represents the value to be given if the pixel value is more
than (sometimes less than) the threshold value. OpenCV
provides different styles of thresholding, and the style
is decided by the fourth parameter of the function.
The different types are:

> cv2.THRESH_BINARY

> cv2.THRESH_BINARY_INV

> cv2.THRESH_TRUNC

> cv2.THRESH_TOZERO

> cv2.THRESH_TOZERO_INV

Two outputs are obtained. The first is retval, the
threshold value that was used. The second output is the
thresholded image.



Fig. 9: Image Describing outputs of different 
Thresholding techniques 


IV. PYTHON ANYWHERE 

Python Anywhere is an online integrated
development environment (IDE) and web hosting
service based on the Python programming language. It
provides in-browser access to server-based Python and
Bash command-line interfaces, along with a code
editor with syntax highlighting. One striking difference
between Python Anywhere and the usual Python cloud
computing solutions is that you can work on it entirely
online, developing your Python application in a web
browser. With this, you can bypass the usual chore of
preparing a local workstation that meets the cloud
hosting service's environment requirements and work
directly inside your browser, which connects to the
consoles provided by Python Anywhere, such as Bash,
Python/IPython 2.6/2.7/3.3, and MySQL.

This section provides a step-by-step guide on how to
deploy your Django applications. Because the Python and
Bash command-line interfaces run in the browser, you can
interact with Python Anywhere's servers just like you
would with a regular terminal instance on your own
computer. Currently, Python Anywhere offers a
free account, which sets you up with an adequate
amount of storage space and CPU time to get a Django
application up and running.


A. Creating a Python Anywhere Account 

First, sign up for a Beginner Python Anywhere
account. If your application takes off and becomes
popular, you can always upgrade your account at a
later stage to gain more storage space and CPU time,
along with a number of other benefits (like hosting
specific domains and SSH access).

Once your account has been created, you will have
your own little slice of the World Wide Web at
http://<username>.pythonanywhere.com, where
<username> is your Python Anywhere username. Your
hosted application will be available from this URL.

B. The Python Anywhere Web Interface 

The Python Anywhere web interface contains a 
dashboard, which in turn provides a series of tabs 
allowing you to manage your application. The tabs as 
illustrated in Fig. 10 include: 

• a consoles tab, allowing you to create and 
interact with Python and Bash console 
instances; 

• a files tab, which allows you to upload to and
organize files within your disk quota;

• a web tab, allowing you to configure settings
for your hosted web application;

• a schedule tab, allowing you to set up tasks to
be executed at particular times; and

• a databases tab, which allows you to configure 
a MySQL instance for your applications should 
you require it. 

Of the five tabs provided, we’ll be working primarily 
with the consoles and web tabs. The Python Anywhere 
help pages provide a series of detailed explanations on 
how to use the other tabs. 



Fig. 10: The Python Anywhere dashboard, showing
the Consoles tab.

C. Python Anywhere to upload the image

Fig. 11: Python Anywhere IDE to upload image

In the Python Anywhere IDE, the user can upload the
image from which he or she wishes to extract the text.
After logging into the Python Anywhere account, the
user goes to the working directory, where an
"Upload a File" option is available. Clicking on it lets
the user choose the desired image and uploads it to the
Python Anywhere cloud.

D. The Bash Console

Fig. 12: Finding Bash Console in Python Anywhere

Python Anywhere allows a user to have two consoles
on a free account. On upgrading the account, a user can
increase this number. To run Python files, one must
open the Bash console.

10:14 ~/mysite $ python testest.py
here
10:14 ~/mysite $

Fig. 13: Running Files in Bash Console

Here we specify the file we wish to run: python is the
command that runs a Python file, and testest.py is the
file name.

E. Result File

Fig. 14: Text files containing extracted text

The text extracted from the images is written to a
text file, where the user can view and edit its
contents. The user can then save the resulting text file
and download it from Python Anywhere.
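
The listing of testest.py is not given here. A minimal sketch of what such a script might contain is shown below, assuming the pytesseract wrapper and the Pillow imaging library are installed alongside the Tesseract engine. The function names, the injectable ocr parameter, and the file names in the usage note are assumptions made for illustration, not the authors' code.

```python
from pathlib import Path


def ocr_with_tesseract(image_path):
    """Run Tesseract on one image through pytesseract."""
    # Assumption: Pillow, pytesseract and the Tesseract binary are installed.
    # They are imported here so the rest of the module loads without them.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as image:
        return pytesseract.image_to_string(image)


def extract_text_to_file(image_path, output_path, ocr=ocr_with_tesseract):
    """Recognize the text in image_path and write it to output_path.

    The ocr argument is any callable that maps an image path to a string,
    which makes it possible to swap the OCR engine or to test without one.
    """
    text = ocr(image_path)
    Path(output_path).write_text(text, encoding="utf-8")
    return text
```

For example, extract_text_to_file("sample.png", "result.txt") would write the text recognized in a hypothetical sample.png to result.txt, which could then be downloaded from the Files tab.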

V. SYSTEM ANALYSIS 
A. System Architecture 

The entire process can be depicted using these basic 
steps: 



Fig. 15: Workflow in the system 

The three basic steps involved in this process are
detection, enhancement and extraction. This diagram
defines the structure of the system.

Fig. 16: Detailed Architecture of system

VI. TEST CASES

Table 2: Test Cases

S. No  Test Case                                     Expected Result  Actual Result   Status
-----  --------------------------------------------  ---------------  --------------  ----------------
1      Image with plain text and plain background    Text extracted   Text extracted  Passed
2      Image with luminance                          Text extracted   Text extracted  Passed
3      Tabular data which contains rows and columns  Text extracted   Text extracted  Passed
4      Letterhead                                    Text extracted   Text extracted  Passed
5      Bond paper with text content in colour        Text extracted   Text extracted  Passed
6      Signboard containing text                     Text extracted   Text extracted  Passed
7      Text with varying font size                   Text extracted   Text extracted  Passed
8      Handwritten text                              Text extracted                   Partially passed
9      Image with high text data of low details      Text extracted   Text extracted  Failed
10     Complex background image with tilted text     Text extracted   Text extracted  Failed
       containing mixed colours
11     Label on water bottle                         Text extracted   Text extracted  Failed

Below are the results of a few of the test cases
performed. The original image and the extracted text
are shown for each.

A. Example 1:

ABSTRACT

HT.NO: 13071A05H8 Name: V Rohith

Learning words from pictures

-System correlates recorded speech withi images, could
lead to fully automated speech) recognition.

Speech recognition systems, such as those that convert speech to text on cell 
phones, are generally the result of machine learning A computer pores through 
thousands or even millions of audio files and their transcriptions, and learns which 
acoustic features correspond to which typed words But transcribing recordings is 
costly, time-consuming work, which has limited speech recognition to a small 
subset of languages spoken in wealthy nations 

The goal of this work is to try to get the machine to learn language more like the 
way humans do New’ approach to training speech-recognition systems are that 
doesn't depend on transcription. Instead, their system analyzes correspondences 
between images and spoken descriptions of those images, as captured in a large 
collection of audio recordings The system then learns which acoustic features of 
the recordings correlate with which image characteristics 

Xerging modalities 


Fig. 17: Image 



Fig. 18: Image with plain background


B. Example 2:



Fig. 19: Bond Paper 


@ IJTSRD | Available Online @ www.ijtsrd.com | Volume - 1 | Issue-6 |Sep-Oct2017 


Page: 315

International Journal of Trend in Scientific Research and Development (IJTSRD) ISSN: 2456-6470 


pytesseract.image_to_string(Image.open('c:/Users/bkkakanw/Documents/project/pytesseract
10:55 ~/mysite $ python testest.py


5 NAKA MANIJMAMT NAGAR, FAIZABAD (U.P.) PIN-224 001 in the 
Of CHAIRMAN, OUtART ENVIRONMENTAL EDUCATIONAL SOCIA1 
ARE TRUST & MANAGER of DU LARI MAI - [ ILA MAHAVIDYAL AYA, 

DULARI NAGAR, PILAI, AMBEDKAR NAGAR (U.P.) PIN-224 168 take oath and 
state as under:- 


That I am Manager of DULARI MANILA MAHAVIDYALAYA, DULARI 
NAGAR, PILAI. AMBEDKAR NAGAR (U.P.) PIN-224 168 and signing this 
Affidavit on behalf of it for which I am fully authorized 

That the managing society/trust has made an application to the NRC, NCTE for running 

an institution namely DULARI MANILA MAHAVIDYAL AYA, DULARI 

NAGAR, PlLAI. AMBEDKAR NAGAR (U.I*.) PIN-224 168 for B.Ed. with an 

intake of lOO Seat. The institution alter fulfilling all the provisions of the NCTE Act, 
Rules & Regulations has got letter of intent under section 7(9) of the NCTE Regulations, 

2009 vide NRC, NCTE letter No. F.Na. NRC/NCTEINRCAPP-729M225”1 
wieeting/75088 dated 07 March, 2014. / 

i That the managing society/trust of the institution has constituted the Selection 

Committee for appointment of faculty as per the policy of the State Government/ 
University/UGC and the following were the Members ofthe Committee:- 


Fig. 20: Bond Paper with plain background


C. Example 3:

Fig. 21: Hand writing Image

Fig. 22: Image with plain background


D. Example 4:

Fig. 23: Image with high text data of low details

06:03 ~/mysite $ python testest.py
here

e"‘la,Vc:: been ckosm 5:201 *Une xole 07f Q$5odcd£
L\- ,, -:2VSUEtQfl£ COL rrriCfliOwH C0YF0TGL T UOTL ("as
<:; r3-<\;si£ment Pmccss wank on *FM’ {we dcu

{6) pf 06 Allans’sp .2016. The job deecwif
tggm Sta-Leg “the, Package, 4,0 be CM’OLLHA,

[QKA Pm’ monum-

06:03 ~/mysite $

Fig. 24: Image with high text data of low details


E. Example 5:

Fig. 25: Complex background image with tilted text
containing mixed colors

C:\Python34>python testest.py
hi

QED B 553$. (Ed $5.9m

at... 53> J 55353. 53 "c5 nr$ _ >c. , : rn : 1:. I 3

C:\Python34>python testest.py
hi

l.j r f ook at success that can transform yam tife J
Dame ! Pink, author 1 of Drtx/e

3: how»; Success" course
C:\Python34>

Fig. 26: Complex background image with tilted text
containing mixed colors with plain background


CONCLUSION

Even though a large number of algorithms have been
proposed in the literature, no single method can
provide satisfactory performance in all
applications, owing to the large variations in character
font, size, texture, color, etc. In this paper we work
toward satisfactory results by enhancing the input
through fine-tuning of the image, and so obtain the
best accuracy we can from Tesseract.

FUTURE SCOPE

With machine learning algorithms constantly being
developed and improved, and massive amounts of
computational power becoming readily available both
locally and on the cloud, unfathomable amounts
of data can be extracted, not only from still
images but also from scenes, video frames, and
scrolling text.